Exploratory Data Analysis for HDB Resale Prices Dataset¶

Note: All exploratory data analysis done in this notebook will be based on the time period: 2013 to 2023

In this jupyter notebook, we will attempt to use data visualization tools to explore the relationships between features (both pre-defined by the original dataset and self-defined by us) and the corresponding HDB resale prices in Singapore. This data exploration process is done after Data Cleaning & Feature Engineering.

This is a complete and more detailed exploratory data analysis on the HDB Resale Price Dataset, of which only some features have been selected to be presented in the Final Report due to space constraints and importance of other aspects of the project.

After which, we will compare the trends across the two time periods in the Final Report.

1. Importing Necessary Libraries¶

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import matplotlib as mpl

import seaborn as sns
sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})

import random

%matplotlib inline

2. Loading HDB Resale Prices Dataset¶

hdb_price_df_13_23: Pandas DataFrame object representing the features and information related to HDB flats in Singapore, for the time period — 2003 to 2012 (10 years)

hdb_price_df_13_23: Pandas DataFrame object representing the features and information related to HDB flats in Singapore, for the time period — 2013 to February 2023 (10 years & 2 months)

hdb_price_df: Pandas DataFrame object will be the combination of both DataFrames — hdb_price_df_13_23 & hdb_price_df_13_23

In [2]:
hdb_price_df_03_12 = pd.read_csv('hdb-price-with-features-2003to2012.csv').iloc[:, 1:]
hdb_price_df_13_23 = pd.read_csv('hdb-price-with-features-2013to2023.csv').iloc[:, 1:]

hdb_price_df_13_23.head()
Out[2]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date remaining_lease ... mall_count_within_1km mall_list_within_1km school_shortest_dist school_shortest_dist_name good_sch_count_within_1km good_sch_list_within_1km good_sch_score good_sch_score_sap_gep distance_to_cbd_km matured
0 2023-03-01 BISHAN 4 ROOM 289 BISHAN ST 24 22 TO 24 103.0 Premium Apartment 1998 74.333333 ... 1 ['Junction 8'] 0.214393 Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1
1 2022-12-01 BISHAN 4 ROOM 289 BISHAN ST 24 22 TO 24 102.0 Premium Apartment 1998 74.583333 ... 1 ['Junction 8'] 0.214393 Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1
2 2022-11-01 BISHAN 4 ROOM 289 BISHAN ST 24 16 TO 18 102.0 Premium Apartment 1998 74.666667 ... 1 ['Junction 8'] 0.214393 Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1
3 2022-06-01 BISHAN 5 ROOM 289 BISHAN ST 24 07 TO 09 121.0 Premium Apartment 1998 75.083333 ... 1 ['Junction 8'] 0.214393 Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1
4 2021-12-01 BISHAN 4 ROOM 289 BISHAN ST 24 07 TO 09 102.0 Premium Apartment 1998 75.583333 ... 1 ['Junction 8'] 0.214393 Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1

5 rows × 30 columns

In [4]:
# new 'year' feature - just to check correctness of time frame 
hdb_price_df_03_12['year'] = pd.DatetimeIndex(hdb_price_df_03_12['month']).year
print("Range of Years in \x1B[3mhdb_price_df_13_23\x1B[0m DataFrame:", np.sort(hdb_price_df_03_12['year'].unique()))

hdb_price_df_13_23['year'] = pd.DatetimeIndex(hdb_price_df_13_23['month']).year
print("Range of Years in \x1B[3mhdb_price_df_13_23\x1B[0m DataFrame:", np.sort(hdb_price_df_13_23['year'].unique()))
Range of Years in hdb_price_df_13_23 DataFrame: [2003 2004 2005 2006 2007 2008 2009 2010 2011 2012]
Range of Years in hdb_price_df_13_23 DataFrame: [2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023]

3. Adjustment of HDB Resale Prices for Inflation (using CPI)¶

adjusted_price: indicates the adjusted HDB Resale Price after accounting for inflation

In [13]:
# Read CPI data
cpi = pd.read_csv('monthly cpi.csv')
cpi = cpi.set_index('Data Series')
cpi_data = pd.DataFrame(cpi.columns, columns=['month'])
cpi_data['CPI'] = pd.to_numeric(np.array(cpi.loc['All Items']))
cpi_data = cpi_data.iloc[1: , :]
cpi_data['month'] = pd.to_datetime(cpi_data['month'])

# adjust prices for inflation using CPI
hdb_price_df_03_12['month'] = pd.to_datetime(hdb_price_df_03_12['month'])
hdb_price_df_03_12 = pd.merge(hdb_price_df_03_12, cpi_data, how = 'left')
hdb_price_df_03_12['adjusted_price'] = (hdb_price_df_03_12['resale_price'] / hdb_price_df_03_12['CPI']) * 100

hdb_price_df_13_23['month'] = pd.to_datetime(hdb_price_df_13_23['month'])
hdb_price_df_13_23 = pd.merge(hdb_price_df_13_23, cpi_data, how = 'left')
hdb_price_df_13_23['adjusted_price'] = (hdb_price_df_13_23['resale_price'] / hdb_price_df_13_23['CPI']) * 100

hdb_price_df_13_23.head()
Out[13]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date remaining_lease ... school_shortest_dist_name good_sch_count_within_1km good_sch_list_within_1km good_sch_score good_sch_score_sap_gep distance_to_cbd_km matured year CPI adjusted_price
0 2023-03-01 BISHAN 4 ROOM 289 BISHAN ST 24 22 TO 24 103.0 Premium Apartment 1998 74.333333 ... Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1 2023 NaN NaN
1 2022-12-01 BISHAN 4 ROOM 289 BISHAN ST 24 22 TO 24 102.0 Premium Apartment 1998 74.583333 ... Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1 2022 111.186 764484.737287
2 2022-11-01 BISHAN 4 ROOM 289 BISHAN ST 24 16 TO 18 102.0 Premium Apartment 1998 74.666667 ... Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1 2022 110.959 683135.212105
3 2022-06-01 BISHAN 5 ROOM 289 BISHAN ST 24 07 TO 09 121.0 Premium Apartment 1998 75.083333 ... Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1 2022 108.671 809783.658934
4 2021-12-01 BISHAN 4 ROOM 289 BISHAN ST 24 07 TO 09 102.0 Premium Apartment 1998 75.583333 ... Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1 2021 104.439 579285.515947

5 rows × 33 columns

In [14]:
# also for the full dataset
hdb_price_df = pd.concat([hdb_price_df_03_12, hdb_price_df_13_23], axis=0)
print(hdb_price_df_03_12['month'].nunique() + hdb_price_df_13_23['month'].nunique() == hdb_price_df['month'].nunique())
True
In [15]:
hdb_price_df.head()
Out[15]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date remaining_lease ... school_shortest_dist_name good_sch_count_within_1km good_sch_list_within_1km good_sch_score good_sch_score_sap_gep distance_to_cbd_km matured year CPI adjusted_price
0 2012-09-01 BISHAN 4 ROOM 289 BISHAN ST 24 07 TO 09 100.0 Premium Apartment 1998 85.0 ... Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1 2012 97.298 657773.027195
1 2012-03-01 BISHAN 4 ROOM 289 BISHAN ST 24 01 TO 05 102.0 Premium Apartment 1998 85.0 ... Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1 2012 95.207 630205.762181
2 2012-03-01 BISHAN 5 ROOM 289 BISHAN ST 24 11 TO 15 121.0 Premium Apartment 1998 85.0 ... Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1 2012 95.207 766750.343987
3 2012-02-01 BISHAN 5 ROOM 289 BISHAN ST 24 07 TO 09 121.0 Premium Apartment 1998 85.0 ... Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1 2012 94.491 756685.821930
4 2011-12-01 BISHAN 4 ROOM 289 BISHAN ST 24 07 TO 09 100.0 Premium Apartment 1998 86.0 ... Catholic High School 1 ['Catholic High School'] 3 2 7.635584 1 2011 93.877 637003.738935

5 rows × 33 columns

In [16]:
# sanity check of adjusted prices
random_index = random.randint(0, len(hdb_price_df)+1)
print((hdb_price_df.iloc[random_index]['adjusted_price'] in hdb_price_df_03_12['adjusted_price'].to_list()) or 
      (hdb_price_df.iloc[random_index]['adjusted_price'] in hdb_price_df_13_23['adjusted_price'].to_list()))
True
In [17]:
hdb_price_df_13_23.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 217191 entries, 0 to 217190
Data columns (total 33 columns):
 #   Column                     Non-Null Count   Dtype         
---  ------                     --------------   -----         
 0   month                      217191 non-null  datetime64[ns]
 1   town                       217191 non-null  object        
 2   flat_type                  217191 non-null  object        
 3   block                      217191 non-null  object        
 4   street_name                217191 non-null  object        
 5   storey_range               217191 non-null  object        
 6   floor_area_sqm             217191 non-null  float64       
 7   flat_model                 217191 non-null  object        
 8   lease_commence_date        217191 non-null  int64         
 9   remaining_lease            217191 non-null  float64       
 10  resale_price               217191 non-null  float64       
 11  address                    217191 non-null  object        
 12  latitude                   217191 non-null  float64       
 13  longitude                  217191 non-null  float64       
 14  shortest_dist_mrt_name     217191 non-null  object        
 15  shortest_dist_mrt          217191 non-null  float64       
 16  mrt_count_within_1km       217191 non-null  int64         
 17  mrt_list_within_1km        217191 non-null  object        
 18  shortest_dist_mall_name    217191 non-null  object        
 19  shortest_dist_mall         217191 non-null  float64       
 20  mall_count_within_1km      217191 non-null  int64         
 21  mall_list_within_1km       217191 non-null  object        
 22  school_shortest_dist       217191 non-null  float64       
 23  school_shortest_dist_name  217191 non-null  object        
 24  good_sch_count_within_1km  217191 non-null  int64         
 25  good_sch_list_within_1km   217191 non-null  object        
 26  good_sch_score             217191 non-null  int64         
 27  good_sch_score_sap_gep     217191 non-null  int64         
 28  distance_to_cbd_km         217191 non-null  float64       
 29  matured                    217191 non-null  int64         
 30  year                       217191 non-null  int64         
 31  CPI                        214784 non-null  float64       
 32  adjusted_price             214784 non-null  float64       
dtypes: datetime64[ns](1), float64(11), int64(8), object(13)
memory usage: 56.3+ MB
In [18]:
# column names
hdb_price_df_13_23.columns
Out[18]:
Index(['month', 'town', 'flat_type', 'block', 'street_name', 'storey_range',
       'floor_area_sqm', 'flat_model', 'lease_commence_date',
       'remaining_lease', 'resale_price', 'address', 'latitude', 'longitude',
       'shortest_dist_mrt_name', 'shortest_dist_mrt', 'mrt_count_within_1km',
       'mrt_list_within_1km', 'shortest_dist_mall_name', 'shortest_dist_mall',
       'mall_count_within_1km', 'mall_list_within_1km', 'school_shortest_dist',
       'school_shortest_dist_name', 'good_sch_count_within_1km',
       'good_sch_list_within_1km', 'good_sch_score', 'good_sch_score_sap_gep',
       'distance_to_cbd_km', 'matured', 'year', 'CPI', 'adjusted_price'],
      dtype='object')

4. Exploring the Relationship between Physical Characteristics of a Flat and HDB Resale Prices¶

The table below summarizes the physical features of a flat that we will be conducting EDA with — | No. | Feature Name | Values | | :---| :----: | :---: | | 1 | flat_type | 1 Room, 2 Room, ... , Executive, Multi-Generation| | 2 | year | 2003, 2004, ... , 2012 | | 3 | town | Bukit Timah, Woodlands, Dhoby Ghaut, ... | | 4 | storey_range | 01 to 03, 03 to 05, ... , 40 to 42| | 5 | floor_area_sqm | | | 6 | blocknum | | | 7 | flat_model | Simplified, New Generation, ..., Terrace, Premium Maisonette | | 8 | lease_commence_date | 1966, 1967, ..., 2010, 2012| | 9 | remaining_lease | |

4.1 Flat Type¶

Distribution of Flat Types¶

In [19]:
print('Unique Flat Types from 2013 to 2023:\n',hdb_price_df_13_23['flat_type'].unique())
Unique Flat Types from 2013 to 2023:
 ['4 ROOM' '5 ROOM' '3 ROOM' 'EXECUTIVE' '2 ROOM' '1 ROOM'
 'MULTI-GENERATION']
In [21]:
flat_count_13_23 = hdb_price_df_13_23.groupby(['flat_type']).size().reset_index()
flat_count_13_23.rename(columns={0: 'count'}, inplace=True)
flat_count_13_23['flat_type_proportion'] = (flat_count_13_23['count']/flat_count_13_23['count'].sum())*100
flat_count_13_23['flat_type_proportion'] = flat_count_13_23['flat_type_proportion'].round(1)
flat_count_13_23
Out[21]:
flat_type count flat_type_proportion
0 1 ROOM 98 0.0
1 2 ROOM 3035 1.4
2 3 ROOM 54630 25.2
3 4 ROOM 89434 41.2
4 5 ROOM 53198 24.5
5 EXECUTIVE 16709 7.7
6 MULTI-GENERATION 87 0.0
In [23]:
fig, ax = plt.subplots(figsize=(13.33, 7.5), dpi=96)

# Grid
# ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
#ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

# Set colors for bar graph
flat_type_proportion_highest = flat_count_13_23['flat_type_proportion'].max()
colors=['#193441']*len(flat_count_13_23)
for index, row in flat_count_13_23.iterrows():
    if row['flat_type_proportion'] == flat_type_proportion_highest:
        colors[index] = '#448c79'
    else:
        colors[index] = '#193441'

# Plot bars
bar1 = ax.bar(flat_count_13_23['flat_type'], flat_count_13_23['flat_type_proportion'], width=0.6, color=colors)

# Label x-axis and x ticks
ax.set_xlabel('', fontsize=12, labelpad=10) # No need for an axis label
ax.xaxis.set_label_position("bottom")

# Reformat y-axis
ax.set_ylabel('Percentage of All Flat Types (%)', fontsize=12, labelpad=10)
ax.yaxis.set_label_position("left")
ax.yaxis.set_major_formatter(lambda s, i : f'{s:,.0f}')
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
ax.yaxis.set_tick_params(pad=2, labeltop=False, labelbottom=True, bottom=False, labelsize=12)

# Add label on top of each bar
ax.bar_label(bar1, labels=[f'{e:,.0f}%' for e in flat_count_13_23['flat_type_proportion']], padding=3, color='black', fontsize=12) 

# Remove  spines
ax.spines[['top','left','bottom']].set_visible(False)

# Make right spine thicker
ax.spines['right'].set_linewidth(1.1)

# Add rectangle box and line (beautification purposes only)
ax.plot([0.08, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.08,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0.08, y=.93, s="2013 to 2023: Distribution of HDB Flat Types", transform=fig.transFigure, ha='left', fontsize=16, weight='bold', alpha=.8)
ax.text(x=0.08, y=.90, s="Bar Graph — Proportions of the different flat types available in Singapore HDB Market", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.85, wspace=None, hspace=None)

# Set a white background
fig.patch.set_facecolor('white')

Observations:

For the time period 2013 to 2023, 4 Room Flats take up the majority of flats in Singapore HDB market, accounting to more than 40% of all flat types. 3 Room and 4 Room takes up about 25% each. Followed by Executive Flat Types at 8%.

There are proportionately very few 1 room, 2 rooms and multi-generation flat types. As such, we will remove the following flat types for subsequent visualizations:

  • 1 Room
  • 2 Room
  • Multi-Generation

Comparing across both 10-year periods:

4 Room flats have increased in its proportion from 38% to 41% across, from time period 2003 to 2012 to time period 2013 to 2023. In the recent decade (i.e. 2013 to 2023), there is a more even proportion between 3 and 5 Room flats compared to the previous decade (i.e. 2003 to 2012). Executive Flat Types remains at a constant proportion of 8% of all Flat Types. 1 Room, 2 Room and Multi-Generation Flat Types remains almost negligble in terms of proportion across both decades. </font>

In [24]:
print('Unique flat types before removing:', hdb_price_df_03_12['flat_type'].unique(),'\n')
keep_flat_type = ['3 ROOM', '4 ROOM', '5 ROOM', 'EXECUTIVE']
hdb_price_df_03_12 = hdb_price_df_03_12[hdb_price_df_03_12['flat_type'].isin(keep_flat_type)]
print('Unique flat types after removing 1 rooms, 2 rooms and multi-generation flat types:',hdb_price_df_03_12['flat_type'].unique())

# do the same for 2013 to 2023
hdb_price_df_13_23 = hdb_price_df_13_23[hdb_price_df_13_23['flat_type'].isin(keep_flat_type)]

hdb_price_df = hdb_price_df[hdb_price_df['flat_type'].isin(keep_flat_type)]
Unique flat types before removing: ['4 ROOM' '5 ROOM' '3 ROOM' 'EXECUTIVE' '2 ROOM' '1 ROOM'
 'MULTI-GENERATION'] 

Unique flat types after removing 1 rooms, 2 rooms and multi-generation flat types: ['4 ROOM' '5 ROOM' '3 ROOM' 'EXECUTIVE']

Distribution of Resale Prices of Flat Types by Year¶

In [25]:
# create new feature flat_type_year
def get_year(x):
    return str(x)[:4]

hdb_price_df_13_23['flat_type_year'] = hdb_price_df_13_23['flat_type'] + ' - ' + hdb_price_df_13_23['month'].apply(lambda x: str(x)[:4])
hdb_price_df_13_23.head()
Out[25]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date remaining_lease ... good_sch_count_within_1km good_sch_list_within_1km good_sch_score good_sch_score_sap_gep distance_to_cbd_km matured year CPI adjusted_price flat_type_year
0 2023-03-01 BISHAN 4 ROOM 289 BISHAN ST 24 22 TO 24 103.0 Premium Apartment 1998 74.333333 ... 1 ['Catholic High School'] 3 2 7.635584 1 2023 NaN NaN 4 ROOM - 2023
1 2022-12-01 BISHAN 4 ROOM 289 BISHAN ST 24 22 TO 24 102.0 Premium Apartment 1998 74.583333 ... 1 ['Catholic High School'] 3 2 7.635584 1 2022 111.186 764484.737287 4 ROOM - 2022
2 2022-11-01 BISHAN 4 ROOM 289 BISHAN ST 24 16 TO 18 102.0 Premium Apartment 1998 74.666667 ... 1 ['Catholic High School'] 3 2 7.635584 1 2022 110.959 683135.212105 4 ROOM - 2022
3 2022-06-01 BISHAN 5 ROOM 289 BISHAN ST 24 07 TO 09 121.0 Premium Apartment 1998 75.083333 ... 1 ['Catholic High School'] 3 2 7.635584 1 2022 108.671 809783.658934 5 ROOM - 2022
4 2021-12-01 BISHAN 4 ROOM 289 BISHAN ST 24 07 TO 09 102.0 Premium Apartment 1998 75.583333 ... 1 ['Catholic High School'] 3 2 7.635584 1 2021 104.439 579285.515947 4 ROOM - 2021

5 rows × 34 columns

In [27]:
# visualize using joypy library
import joypy

# time block: 2004 to 2013
fig, ax = joypy.joyplot(hdb_price_df_13_23,
                            by='flat_type_year',
                            column='adjusted_price',
                            figsize=(12,8),
                            title='Distribution of Resale Prices based on Flat Type for 2013 to 2023',
                            linewidth=0.05,
                            overlap=1.5,
                            alpha=0.8,
                            colormap=plt.cm.get_cmap('tab20',4))

ax[-1].set_xlim([0, 1400000])
ax[-1].set_xticklabels(['0', '200k', '400k', '600k', '800k', '1000k', '1200k', '1400k']) 
plt.xlabel('Resale Price SGD ($)', fontsize=14)

plt.show()

Observations from the above Ridgeline Plot:

  1. Consistent pattern throughout the years that HDB flats with more rooms tend to fetch higher resale prices, which is unsurprising.

  2. Looking at how prices vary for the same number of rooms (i.e. same flat type), HDB resale prices generally stay the same over the years for the same flat type from 2013 to 2023.

4.2 Town¶

2013 to 2023¶

In [28]:
hdb_price_df_13_23['year'] = pd.DatetimeIndex(hdb_price_df_13_23['month']).year # extract out year
town_prices_13_23 = hdb_price_df_13_23.groupby(['town'], as_index=False).agg({'adjusted_price': 'median'}).sort_values('adjusted_price', ascending=True).reset_index(drop=True)
town_prices_13_23['adjusted_price'] = round(town_prices_13_23['adjusted_price']/1000) # convert prices in terms of SGD1000 for easier visualization since prices are all in '000s
town_prices_13_23
Out[28]:
town adjusted_price
0 ANG MO KIO 370.0
1 YISHUN 371.0
2 BUKIT BATOK 384.0
3 BEDOK 389.0
4 GEYLANG 398.0
5 WOODLANDS 402.0
6 JURONG EAST 405.0
7 TOA PAYOH 414.0
8 SEMBAWANG 415.0
9 JURONG WEST 418.0
10 CHOA CHU KANG 420.0
11 HOUGANG 429.0
12 CLEMENTI 430.0
13 BUKIT PANJANG 434.0
14 SENGKANG 459.0
15 TAMPINES 468.0
16 SERANGOON 472.0
17 PUNGGOL 473.0
18 MARINE PARADE 474.0
19 KALLANG/WHAMPOA 494.0
20 PASIR RIS 496.0
21 CENTRAL AREA 542.0
22 QUEENSTOWN 593.0
23 BUKIT MERAH 602.0
24 BISHAN 613.0
25 BUKIT TIMAH 711.0
In [29]:
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)

lowest_median_price, highest_median_price = town_prices_13_23['adjusted_price'].min(), town_prices_13_23['adjusted_price'].max()
colors=['#e3dfda']*len(town_prices_13_23)
# set to lowest and highest prices to red and green respectively
for index, row in town_prices_13_23.iterrows():
    if row['adjusted_price'] == lowest_median_price:
        colors[index] = '#e69798'
    if row['adjusted_price'] == highest_median_price:
        colors[index] = '#448c79'

# Grid
ax.grid(which="major", axis='x', alpha = 0.5,color='#DAD8D7', zorder=1)
ax.grid(which="major", axis='y', alpha=0.5, color='#DAD8D7', zorder=1)

# Plot bars
bar1=ax.barh(town_prices_13_23['town'], town_prices_13_23['adjusted_price'], color=colors)

# Label x-axis and x ticks
ax.set_xlabel("Median Resale Prices (SGD'000)", fontsize=12, labelpad=10) 
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)

# Add label on top of each bar
ax.bar_label(bar1, labels=[f'{e:,.0f}' for e in town_prices_13_23['adjusted_price']], padding=3, color='black', fontsize=10) 

# Remove spines and make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)

# Add rectangle box and line (beautification purposes only)
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Median HDB Resale Prices by Town", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Horizontal Bar Graph — Comparing the prices of HDB across different Towns", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)

plt.show()

Observations:

  1. Bukit Timah is the most expensive town across 2013 to 2023. It has the highest median HDB Resale Price of SGD$711,000.

  2. Ang Mo Kio has the lowest median resale price of SGD$370,000, indicating that it is the least expensive town across 2013 to 2023.

Comparing across both 10-year periods:

Bukit Timah remains the most expensive town across both decades, from 2003 all the way to 2023 and it has increased in its median price. </font>

2003 vs 2023: Percentage Change of HDB Resale Prices by Town¶

In [30]:
town_prices_04_23 = hdb_price_df[hdb_price_df['year'].isin([2003,2023])].groupby(['town','year'], as_index=False).agg({'adjusted_price': 'median'})
town_prices_04_23['change'] = town_prices_04_23.groupby('town')['adjusted_price'].apply(lambda x: x.pct_change()*100).round() # calculate percentage change between 2004 and 2023
town_prices_04_23 = town_prices_04_23[town_prices_04_23['change'].notnull()] 
town_prices_04_23 = town_prices_04_23.sort_values('change', ascending=True).reset_index(drop=True)

town_prices_04_23
Out[30]:
town year adjusted_price change
0 JURONG EAST 2023 411540.705764 29.0
1 PASIR RIS 2023 570033.304308 40.0
2 MARINE PARADE 2023 471287.377577 45.0
3 CHOA CHU KANG 2023 475775.828793 47.0
4 TAMPINES 2023 502706.536083 48.0
5 BEDOK 2023 386006.804492 49.0
6 SENGKANG 2023 511683.438513 51.0
7 BUKIT TIMAH 2023 585742.883561 53.0
8 HOUGANG 2023 480264.280008 54.0
9 SERANGOON 2023 473980.448307 55.0
10 SEMBAWANG 2023 483855.040980 58.0
11 BISHAN 2023 662495.399338 61.0
12 WOODLANDS 2023 475775.828793 61.0
13 GEYLANG 2023 382864.888642 64.0
14 BUKIT PANJANG 2023 475775.828793 65.0
15 ANG MO KIO 2023 403960.609352 69.0
16 JURONG WEST 2023 457822.023932 69.0
17 TOA PAYOH 2023 457822.023932 80.0
18 CENTRAL AREA 2023 505798.181280 81.0
19 YISHUN 2023 421914.414212 84.0
20 CLEMENTI 2023 500462.310475 99.0
21 KALLANG/WHAMPOA 2023 534125.694588 111.0
22 BUKIT BATOK 2023 538614.145803 114.0
23 BUKIT MERAH 2023 658904.638365 160.0
24 QUEENSTOWN 2023 610429.365243 174.0
In [31]:
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)

smallest_price_change, largest_price_change = town_prices_04_23['change'].min(), town_prices_04_23['change'].max()
colors=['#e3dfda']*len(town_prices_04_23)
# set to lowest and highest prices to red and green respectively
for index, row in town_prices_04_23.iterrows():
    if row['change'] == smallest_price_change:
        colors[index] = '#e69798'
    if row['change'] == largest_price_change:
        colors[index] = '#448c79'

# Plot bars
bar1 = ax.barh(town_prices_04_23['town'], town_prices_04_23['change'], color=colors)

# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

# Label x-axis and x ticks
ax.set_xlabel("Percentage Change in Median Resale Prices (%)", fontsize=13, labelpad=10) 
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)

# Add label on top of each bar
ax.bar_label(bar1, labels=[f'{e:,.0f}' for e in town_prices_04_23['change']], padding=3, color='black', fontsize=11) 

# Remove the spines and make the right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)


# Add in red line and rectangle on top
ax.plot([0, 0.9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0,0.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0, y=.93, s="2003 vs 2023: Percentage Change of HDB Resale Prices by Town", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0, y=.91, s="Horizontal Bar Graph — How HDB resale prices changed in the last two decades, based on the town's median price", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)


plt.show()

Observations:

Comparing the percentage change in median HDB resale prices between 2003 and this year, 2023, all the towns in Singapore has generally increased in its median HDB resale prices.

Queenstown has the highest percentage change in its resale prices (157% which is more than double) while Jurong East has the lowest increase in resale prices, at only 29%.

Past Two Years (2022 and 2023): Percentage Change of Resale Prices by Town¶

In [32]:
town_prices_22_23 = hdb_price_df[hdb_price_df['year'].isin([2022,2023])].groupby(['town','year'], as_index=False).agg({'adjusted_price': 'median'})
town_prices_22_23
Out[32]:
town year adjusted_price
0 ANG MO KIO 2022 395947.680293
1 ANG MO KIO 2023 403960.609352
2 BEDOK 2022 410962.607810
3 BEDOK 2023 386006.804492
4 BISHAN 2022 643169.539491
5 BISHAN 2023 662495.399338
6 BUKIT BATOK 2022 450218.677643
7 BUKIT BATOK 2023 538614.145803
8 BUKIT MERAH 2022 624537.762297
9 BUKIT MERAH 2023 658904.638365
10 BUKIT PANJANG 2022 467684.780458
11 BUKIT PANJANG 2023 475775.828793
12 BUKIT TIMAH 2022 659954.263717
13 BUKIT TIMAH 2023 585742.883561
14 CENTRAL AREA 2022 555236.976962
15 CENTRAL AREA 2023 505798.181280
16 CHOA CHU KANG 2022 478633.431848
17 CHOA CHU KANG 2023 475775.828793
18 CLEMENTI 2022 489400.846482
19 CLEMENTI 2023 500462.310475
20 GEYLANG 2022 419613.518655
21 GEYLANG 2023 382864.888642
22 HOUGANG 2022 471274.889205
23 HOUGANG 2023 480264.280008
24 JURONG EAST 2022 418703.294103
25 JURONG EAST 2023 411540.705764
26 JURONG WEST 2022 464151.158107
27 JURONG WEST 2023 457822.023932
28 KALLANG/WHAMPOA 2022 624148.542663
29 KALLANG/WHAMPOA 2023 534125.694588
30 MARINE PARADE 2022 449443.230573
31 MARINE PARADE 2023 471287.377577
32 PASIR RIS 2022 560912.222812
33 PASIR RIS 2023 570033.304308
34 PUNGGOL 2022 511008.800389
35 PUNGGOL 2023 520660.340943
36 QUEENSTOWN 2022 644864.420804
37 QUEENSTOWN 2023 610429.365243
38 SEMBAWANG 2022 484866.567838
39 SEMBAWANG 2023 483855.040980
40 SENGKANG 2022 501448.107151
41 SENGKANG 2023 511683.438513
42 SERANGOON 2022 495746.454346
43 SERANGOON 2023 473980.448307
44 TAMPINES 2022 502917.480831
45 TAMPINES 2023 502706.536083
46 TOA PAYOH 2022 489653.518975
47 TOA PAYOH 2023 457822.023932
48 WOODLANDS 2022 466314.696363
49 WOODLANDS 2023 475775.828793
50 YISHUN 2022 427030.053426
51 YISHUN 2023 421914.414212
In [33]:
town_prices_22_23 = hdb_price_df[hdb_price_df['year'].isin([2021,2022])].groupby(['town','year'], as_index=False).agg({'adjusted_price': 'median'})
town_prices_22_23['change'] = town_prices_22_23.groupby('town')['adjusted_price'].apply(lambda x: x.pct_change()*100).round() # calculate percentage change between 2004 and 2023
town_prices_22_23 = town_prices_22_23[town_prices_22_23['change'].notnull()] 
town_prices_22_23 = town_prices_22_23.sort_values('change', ascending=True).reset_index(drop=True).reset_index()
town_prices_22_23
Out[33]:
index town year adjusted_price change
0 0 CENTRAL AREA 2022 555236.976962 -16.0
1 1 GEYLANG 2022 419613.518655 -13.0
2 2 QUEENSTOWN 2022 644864.420804 -7.0
3 3 CLEMENTI 2022 489400.846482 -7.0
4 4 MARINE PARADE 2022 449443.230573 -6.0
5 5 JURONG EAST 2022 418703.294103 -5.0
6 6 BEDOK 2022 410962.607810 -4.0
7 7 BUKIT TIMAH 2022 659954.263717 -4.0
8 8 SERANGOON 2022 495746.454346 -1.0
9 9 BUKIT MERAH 2022 624537.762297 -1.0
10 10 BUKIT PANJANG 2022 467684.780458 -1.0
11 11 TAMPINES 2022 502917.480831 1.0
12 12 YISHUN 2022 427030.053426 1.0
13 13 HOUGANG 2022 471274.889205 1.0
14 14 CHOA CHU KANG 2022 478633.431848 2.0
15 15 TOA PAYOH 2022 489653.518975 2.0
16 16 PASIR RIS 2022 560912.222812 3.0
17 17 BISHAN 2022 643169.539491 3.0
18 18 JURONG WEST 2022 464151.158107 4.0
19 19 PUNGGOL 2022 511008.800389 4.0
20 20 SENGKANG 2022 501448.107151 5.0
21 21 WOODLANDS 2022 466314.696363 6.0
22 22 ANG MO KIO 2022 395947.680293 7.0
23 23 SEMBAWANG 2022 484866.567838 8.0
24 24 BUKIT BATOK 2022 450218.677643 8.0
25 25 KALLANG/WHAMPOA 2022 624148.542663 19.0
In [34]:
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)

# smallest_price_change, largest_price_change = town_prices_22_23['change'].min(), town_prices_22_23['change'].max()
colors=['#e3dfda']*len(town_prices_22_23)
# set to lowest and highest prices to red and green respectively
for index, row in town_prices_22_23.iterrows():
    if row['change'] < 0:
        colors[index] = '#e69798'
    if row['change'] > 0:
        colors[index] = '#448c79'

# Plot bars
bar1 = ax.barh(town_prices_22_23['town'], town_prices_22_23['change'], color=colors)

# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

# Label x-axis and x ticks
ax.set_xlabel("Percentage Change in Median Resale Prices (%)", fontsize=13, labelpad=10) 
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)

# Add label on top of each bar
ax.bar_label(bar1, labels=[f'{e:,.0f}' for e in town_prices_22_23['change']], padding=3, color='black', fontsize=11) 

# Remove the spines
ax.spines[['top','left','bottom']].set_visible(False)

# Make the right spine thicker
ax.spines['right'].set_linewidth(1.1)

# Add in red line and rectangle on top
ax.plot([0, 0.9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0,0.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0, y=.93, s="2022 vs 2023: Percentage Change of HDB Resale Prices by Town", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0, y=.91, s="Horizontal Bar Graph — How HDB resale prices changed in the past two years, based on the town's median price", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)

plt.show()

Observations:

The changes are not very significant from 2022 to 2023, generally increase/decrease by about 10%.

Prices have dropped the most (16%) in Central Area while on the other hand, prices in Kallang/Whampoa have increased the most, by 19%.

Could it be because these areas have older HDB meaning their lease are now shorter? As shown below, it seems that places like Punggol, and Sengkang, tend to have later lease commence date, as they were developed later, which might have led to their slight increase in prices, while places like Toa Payoh and Central Area, tend to have older lease commence date.

In [35]:
hdb_price_df[hdb_price_df['year'].isin([2022,2023])].groupby('town')['lease_commence_date'].median().sort_values()
Out[35]:
town
MARINE PARADE      1975.0
ANG MO KIO         1980.0
BEDOK              1980.0
CLEMENTI           1981.0
GEYLANG            1982.0
TOA PAYOH          1984.0
CENTRAL AREA       1984.0
JURONG EAST        1984.0
SERANGOON          1986.0
BUKIT BATOK        1987.0
YISHUN             1988.0
BISHAN             1988.0
TAMPINES           1988.0
BUKIT TIMAH        1988.0
HOUGANG            1991.0
PASIR RIS          1993.0
BUKIT MERAH        1996.0
KALLANG/WHAMPOA    1996.0
JURONG WEST        1997.0
WOODLANDS          1998.0
BUKIT PANJANG      1999.0
CHOA CHU KANG      1999.0
QUEENSTOWN         2003.0
SEMBAWANG          2004.0
SENGKANG           2012.0
PUNGGOL            2014.0
Name: lease_commence_date, dtype: float64

4.3 Storey Range¶

In [36]:
# Storey Prices for 2013 to 2023
storey_13_23 = hdb_price_df_13_23.groupby('storey_range')['adjusted_price'].median().reset_index().sort_values(by='storey_range')
storey_13_23['storey_rank'] = storey_13_23['storey_range'].astype('category').cat.codes + 1 # label encode storey -> give each storey range a rank
storey_13_23
Out[36]:
storey_range adjusted_price storey_rank
0 01 TO 03 3.984660e+05 1
1 04 TO 06 4.151394e+05 2
2 07 TO 09 4.289781e+05 3
3 10 TO 12 4.398682e+05 4
4 13 TO 15 4.781860e+05 5
5 16 TO 18 5.133940e+05 6
6 19 TO 21 6.043963e+05 7
7 22 TO 24 6.381870e+05 8
8 25 TO 27 7.081029e+05 9
9 28 TO 30 7.865138e+05 10
10 31 TO 33 8.152569e+05 11
11 34 TO 36 8.361254e+05 12
12 37 TO 39 8.441096e+05 13
13 40 TO 42 8.885473e+05 14
14 43 TO 45 9.452847e+05 15
15 46 TO 48 1.031838e+06 16
16 49 TO 51 1.097620e+06 17
In [38]:
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)

# Set colors
colors = ['#193441']*len(storey_13_23)
for index, row in storey_13_23.iterrows():
    if row['adjusted_price'] > 700000:
        colors[index] = '#e69798'

scatter1=sns.scatterplot(x=storey_13_23['storey_range'], y=storey_13_23['adjusted_price'], s=storey_13_23['storey_rank'].astype('int')*30, sizes=(20,200),color=colors, edgecolors='w', alpha=0.5, ax=ax)

# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

# Label data points
for i, row in storey_13_23.iterrows():
    # offset = row['storey_rank'] / 50
    label = str(round(row['adjusted_price']/1000)) + 'K'
    plt.annotate(label, xy=(row['storey_range'], row['adjusted_price']), ha='center', va='bottom', xytext=(0, 14), textcoords='offset points', fontsize=11)

# Label x-axis and x ticks
ax.set_xlabel("Percentage Change in Median Resale Prices (%)", fontsize=12, labelpad=10) 
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)
plt.xticks(rotation=45, fontsize=10)

# Reformat y-axis
ylabels = ['{:,.0f}'.format(x) + 'K' for x in scatter1.get_yticks()/1000]
scatter1.set_yticklabels(ylabels)
scatter1.set_ylim([300000,1200000])
scatter1.set_ylabel('Resale Price SGD ($)', size=15)
scatter1.set_xlabel('Storey', size=15)

# Remove the spines and make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)

ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Storey and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Bubble Scatterplot — How HDB resale prices are associated with floor level, based on the median price of each storey range", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)

plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/628486241.py:28: UserWarning: FixedFormatter should only be used together with FixedLocator
  scatter1.set_yticklabels(ylabels)

Observations:

General trend that as floor level increases, the median HDB resale prices are also associated with an increase.

Unlike the dataset from 2003 to 2012, the X-axis labels are already well-labelled in the 2013 to 2023 dataset, hence there is no need for any adjustments.

Comparing across both 10-year periods:

For 2003 to 2012, at the high floors, the HDB Resale Prices tend to plateau and does not increase as much compared to the middle floors. However, for 2013 to 2023, HDB Resale Prices continuously increase as floor level increases, regardless of how high the floor gets. In fact, above 37th floor, the rate of increase of prices is still increasing, and HDB Flats in those floors are fetching close to or more than SGD$1 Million. </font>

4.4 Floor Area¶

In [39]:
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)

# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

p=sns.regplot(x='floor_area_sqm', y='adjusted_price', data=hdb_price_df_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylabel('Resale Price SGD ($)', size=15)
p.set_xlabel('Floor Area (Square Meters)', size=15)

# Remove the spines and make right thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)

ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Floor Area and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Scatterplot & Regression Line — How HDB resale prices are associated with floor area (in sqm)", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/4232059774.py:9: UserWarning: FixedFormatter should only be used together with FixedLocator
  p.set_yticklabels(ylabels)

Observations:

As floor area of a HDB resale flat increases, its resale price is associated with an increase as well. Majority of transacted HDB resale flats have a floor area of between 50 and 200 square metres.

Comparing across both 10-year periods:

Not much visible difference between the steepness of graphs showing the relationship between Floor Area and HDB Resale Prices across both decades.

</font>

Using Median Prices for each Floor Area¶

In [41]:
floor_area_13_23 = hdb_price_df_13_23.groupby(['floor_area_sqm'], as_index=False).agg({'adjusted_price':'median'})
floor_area_13_23 
Out[41]:
floor_area_sqm adjusted_price
0 49.0 3.138981e+05
1 51.0 2.483005e+05
2 52.0 2.527525e+05
3 53.0 2.531492e+05
4 54.0 2.964794e+05
... ... ...
154 243.0 1.034958e+06
155 249.0 1.062119e+06
156 259.0 1.163332e+06
157 266.0 1.050334e+06
158 280.0 1.065873e+06

159 rows × 2 columns

In [42]:
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)

# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

p=sns.regplot(x='floor_area_sqm', y='adjusted_price', data=floor_area_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylabel('Resale Price SGD ($)', size=15)
p.set_xlabel('Floor Area (Square Meters)', size=15)

# Remove the spines and make right thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)

ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Floor Area and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Scatterplot & Regression Line — using Median HDB Resale Price for each Floor Area", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/574426306.py:9: UserWarning: FixedFormatter should only be used together with FixedLocator
  p.set_yticklabels(ylabels)
In [43]:
# what are those HDBs which have bigger than 200 sqm of floor area
above_200_sqm_df = hdb_price_df_13_23[hdb_price_df_13_23['floor_area_sqm'] > 200].reset_index()
above_200_sqm_df.groupby('flat_model')['index'].nunique()
Out[43]:
flat_model
Maisonette             2
Premium Maisonette     2
Terrace               10
Name: index, dtype: int64

Observations:

Trend is similar to without taking median prices for each floor area; previous visualization.

Flat Models of HDBs which have larger than 200 sqm of floor area are mostly Terrace, which is not surprising.

Comparing across both 10-year periods:

Not much visible difference between the steepness of graphs showing the relationship between Floor Area and HDB Resale Prices across both decades.

</font>

4.5 Block Number¶

3 digit system was introduced in the 1970s, with the 1st digit representing a neighbourhood in a town. So for e.g., AMK neighbourhood 1 starts with 101, and AMK neighbourhood 2 starts with 201. So first digit was separated from last 2 digits and plotted separately

In [44]:
import re

# Block Number Prices
get_num = lambda x: int(re.findall("\d+", x)[0])
hdb_price_df_13_23['blocknum'] = hdb_price_df_13_23['block'].apply(get_num) # get only digits from block number
tmp = hdb_price_df_13_23[hdb_price_df_13_23['blocknum'] > 99] # get only blocks that use 3-digit numbering system
tmp = tmp.groupby('blocknum')['adjusted_price'].median().reset_index()

fig, ax = plt.subplots(figsize=(20, 13), dpi=96)

ax1 = plt.subplot(121)
a=sns.scatterplot(x=tmp['blocknum'].apply(lambda x: int(str(x)[0])), y=tmp['adjusted_price'], color='#193441', edgecolors='w', alpha=0.9)
ylabels = ['{:,.0f}'.format(x) + 'K' for x in a.get_yticks()/1000]
ax1.set_yticklabels(ylabels)
ax1.set_ylabel('Resale Price ($SGD)', size=15), ax1.set_xlabel('Neighbourhood Number', size=15)

ax2 = plt.subplot(122)
b=sns.scatterplot(x=tmp['blocknum'].apply(lambda x: int(str(x)[1:])), y=tmp['adjusted_price'], color = '#e69798',edgecolors='w', alpha=0.9)
ax2.set_yticklabels(ylabels)
ax2.set_ylabel('', size=15)
ax2.set_xlabel('Block Number', size=15)

# Add in line and rectangle on top
plt.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax1.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
plt.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Neighbourhood/Block Number and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
plt.text(x=0.05, y=.91, s="Scatterplot — How (if any) HDB Resale Price is associated with Neighbourhood/Block Number", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/2226950048.py:14: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax1.set_yticklabels(ylabels)
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/2226950048.py:19: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax2.set_yticklabels(ylabels)

Observations:

Neighbourhood/Block Number does not seem to have any association with HDB Resale Prices.

Comparing across both 10-year periods:

Not much visible difference in the scatter plots across both decades.

</font>

4.6 Flat Model¶

In [47]:
fig, ax = plt.subplots(figsize=(12, 14), dpi=96)

# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

p=sns.violinplot(x='adjusted_price', 
                 y='flat_model', 
                 data=hdb_price_df_13_23, 
                 width=1,
                 orient='h', # horizontal plot
                 order=hdb_price_df_13_23.groupby('flat_model')['adjusted_price'].median().sort_values().reset_index()['flat_model'].tolist(),
                 palette='Set3')

# Label x-axis and x ticks
p.set_xlabel("Resale Price (SGD$)", fontsize=14, labelpad=10) 
xlabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_xticks()/1000]
p.set_xticklabels(xlabels)
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)
p.set_xlim([0,1500000])

# Reformat y-axis
# ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
# p.set_yticklabels(ylabels)
p.set_ylabel("Flat Model", size=15)
# p.set_ylim([50000,1300000])

# Remove the spines and make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)

ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0.05, y=.94, s="2013 to 2023: Relationship between Flat Models and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.92, s="Violin Boxplot — Distribution of HDB Resale Prices across different Flat Models", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)

plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/292884573.py:18: UserWarning: FixedFormatter should only be used together with FixedLocator
  p.set_xticklabels(xlabels)

Observations:

Flat Models like Premium Apartment Loft, Type S1 and Type S2 have relatively high median resale prices, compared to Simplified, New Generation and Standard flat models. Improved Flat Model has the largest range.

Comparing across both 10-year periods:

Not much visible difference between the violin plot distributions showing the relationship between Flat Model and HDB Resale Prices across both decades.

</font>

In [49]:
x = np.array((hdb_price_df_13_23['adjusted_price'] < 0).to_list())
np.unique(x)
Out[49]:
array([False])

4.7 Lease Commence Date (in Years)¶

In [50]:
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)

# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

a=sns.boxplot(y='lease_commence_date', x='adjusted_price', data=hdb_price_df_13_23, width=1, orient='h', flierprops = dict(markerfacecolor = 'red', markersize = 0.1, linestyle='none'), linewidth=0.4)
ax.set_xlabel('Resale Price ($SGD)', size=15)
xlabels = ['{:,.0f}'.format(x) + 'K' for x in a.get_xticks()/1000]
ax.set_xticklabels(xlabels)
p.set_xlim([0,1300000])

ax.set_ylabel('Lease Commence Year', size=15)


# Remove the spines and make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)

ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Lease Commence (Year) and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Boxplot — Distribution of Resale Prices amoung HDB Flats with different years of Lease Commencement", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)

plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/4102906681.py:10: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_xticklabels(xlabels)

Observations:

HDB Flats with earlier Lease Commence Dates generally fetch lower Median Resale Prices compared to those that have more recent Lease Commence Dates. However, HDB Flats with 2011 as their Lease Commence Date has the highest Median Resale Prices, even HDB Flats that are newer (i.e. Lease Commence Date later than 2011) have lower Median Resale Prices.

Comparing across both 10-year periods:

Not much visible difference between the boxplot distributions showing the relationship between Lease Commence and HDB Resale Prices across both decades.

</font>

4.8 Remaining Lease (in Years)¶

In [ ]:
hdb_price_df['remaining_lease'].unique()
Out[ ]:
array([ 85.        ,  86.        ,  87.        ,  88.        ,
        89.        ,  90.        ,  91.        ,  92.        ,
        93.        ,  94.        ,  70.        ,  71.        ,
        72.        ,  73.        ,  74.        ,  75.        ,
        76.        ,  77.        ,  78.        ,  79.        ,
        80.        ,  81.        ,  82.        ,  83.        ,
        84.        ,  65.        ,  66.        ,  67.        ,
        68.        ,  69.        ,  95.        ,  96.        ,
        54.        ,  55.        ,  56.        ,  57.        ,
        58.        ,  59.        ,  60.        ,  61.        ,
        62.        ,  63.        ,  64.        ,  97.        ,
        98.        ,  99.        , 100.        , 101.        ,
        74.33333333,  74.58333333,  74.66666667,  75.08333333,
        75.58333333,  76.5       ,  76.66666667,  76.75      ,
        79.08333333,  80.16666667,  59.41666667,  60.16666667,
        60.33333333,  60.91666667,  60.83333333,  61.08333333,
        61.25      ,  61.41666667,  61.66666667,  62.33333333,
        62.5       ,  62.75      ,  63.33333333,  63.5       ,
        63.58333333,  64.33333333,  65.16666667,  65.33333333,
        65.5       ,  62.91666667,  63.25      ,  63.66666667,
        64.16666667,  66.83333333,  67.33333333,  67.58333333,
        94.75      ,  94.91666667,  64.08333333,  64.66666667,
        64.75      ,  64.91666667,  65.41666667,  65.66666667,
        65.58333333,  66.33333333,  66.58333333,  68.33333333,
        61.91666667,  63.41666667,  63.91666667,  64.41666667,
        65.25      ,  65.83333333,  66.5       ,  66.91666667,
        60.66666667,  66.66666667,  64.25      ,  64.83333333,
        67.5       ,  67.75      ,  67.83333333,  67.91666667,
        68.16666667,  68.41666667,  68.83333333,  69.41666667,
        63.75      ,  63.83333333,  64.58333333,  67.08333333,
        67.25      ,  67.16666667,  68.5       ,  68.58333333,
        68.66666667,  59.16666667,  59.91666667,  60.08333333,
        63.08333333,  64.5       ,  54.25      ,  55.5       ,
        55.58333333,  56.08333333,  56.25      ,  56.33333333,
        56.66666667,  58.33333333,  59.66666667,  59.75      ,
        62.41666667,  62.58333333,  63.16666667,  57.66666667,
        57.75      ,  58.16666667,  58.25      ,  58.75      ,
        58.66666667,  59.58333333,  60.41666667,  61.33333333,
        61.83333333,  62.16666667,  62.66666667,  66.16666667,
        66.75      ,  67.41666667,  68.75      ,  54.91666667,
        55.25      ,  55.66666667,  56.16666667,  57.16666667,
        58.91666667,  60.58333333,  62.25      ,  65.08333333,
        66.41666667,  69.08333333,  75.66666667,  76.16666667,
        76.41666667,  76.83333333,  77.08333333,  77.16666667,
        77.33333333,  77.41666667,  78.25      ,  78.5       ,
        78.58333333,  78.75      ,  79.41666667,  79.33333333,
        79.75      ,  80.25      ,  81.08333333,  81.25      ,
        81.33333333,  81.41666667,  81.5       ,  80.08333333,
        80.41666667,  81.16666667,  82.16666667,  82.25      ,
        82.33333333,  82.66666667,  83.75      ,  83.83333333,
        84.33333333,  84.5       ,  84.75      ,  84.83333333,
        84.91666667,  85.08333333,  85.58333333,  85.75      ,
        85.83333333,  75.75      ,  76.08333333,  76.25      ,
        77.5       ,  77.66666667,  77.83333333,  77.91666667,
        78.33333333,  79.25      ,  80.75      ,  81.58333333,
        69.33333333,  69.75      ,  69.91666667,  70.16666667,
        70.33333333,  71.91666667,  71.83333333,  72.83333333,
        73.75      ,  74.16666667,  74.41666667,  69.25      ,
        70.91666667,  73.66666667,  75.91666667,  75.83333333,
        77.25      ,  77.75      ,  79.58333333,  79.91666667,
        81.83333333,  61.16666667,  66.25      ,  69.83333333,
        92.08333333,  92.58333333,  92.66666667,  92.91666667,
        93.16666667,  93.58333333,  93.83333333,  93.91666667,
        94.16666667,  94.33333333,  94.25      ,  94.41666667,
        94.5       ,  94.58333333,  94.66666667,  95.16666667,
        95.25      ,  92.33333333,  92.41666667,  92.75      ,
        93.08333333,  94.08333333,  76.91666667,  78.08333333,
        78.16666667,  78.83333333,  78.91666667,  79.16666667,
        79.66666667,  68.08333333,  65.91666667,  69.58333333,
        70.08333333,  70.83333333,  71.16666667,  71.33333333,
        62.83333333,  87.83333333,  87.91666667,  88.08333333,
        88.25      ,  89.08333333,  89.33333333,  89.5       ,
        89.75      ,  90.33333333,  90.41666667,  91.33333333,
        92.16666667,  93.25      ,  93.41666667,  93.66666667,
        55.41666667,  55.91666667,  56.41666667,  56.5       ,
        56.58333333,  57.91666667,  59.33333333,  59.25      ,
        55.08333333,  55.75      ,  55.83333333,  58.58333333,
        73.91666667,  74.5       ,  74.83333333,  75.33333333,
        75.41666667,  81.91666667,  82.08333333,  82.41666667,
        82.5       ,  82.83333333,  83.66666667,  84.41666667,
        84.66666667,  73.16666667,  73.41666667,  73.5       ,
        73.58333333,  73.83333333,  75.5       ,  76.58333333,
        77.58333333,  78.66666667,  69.66666667,  70.58333333,
        70.66666667,  71.08333333,  71.25      ,  71.58333333,
        74.25      ,  74.75      ,  93.75      ,  94.83333333,
        95.08333333,  88.41666667,  88.58333333,  88.5       ,
        89.16666667,  89.83333333,  90.16666667,  90.5       ,
        91.16666667,  91.58333333,  92.25      ,  93.5       ,
        73.25      ,  74.91666667,  75.25      ,  83.16666667,
        83.25      ,  83.33333333,  83.41666667,  83.58333333,
        84.16666667,  80.5       ,  80.66666667,  80.91666667,
        81.75      ,  82.91666667,  84.08333333,  85.25      ,
        71.5       ,  71.66666667,  72.5       ,  72.41666667,
        72.58333333,  56.83333333,  59.08333333,  59.83333333,
        60.25      ,  61.5       ,  61.75      ,  62.08333333,
        57.08333333,  57.33333333,  61.58333333,  56.75      ,
        57.25      ,  57.41666667,  57.5       ,  57.58333333,
        57.83333333,  60.5       ,  54.5       ,  54.83333333,
        55.16666667,  58.08333333,  58.5       ,  54.58333333,
        59.5       ,  60.75      ,  44.16666667,  44.25      ,
        44.41666667,  44.66666667,  44.83333333,  44.91666667,
        46.58333333,  46.75      ,  47.        ,  48.08333333,
        48.5       ,  48.58333333,  48.91666667,  49.        ,
        50.        ,  52.        ,  53.        ,  45.5       ,
        47.08333333,  47.91666667,  48.75      ,  51.        ,
        67.66666667,  45.08333333,  45.16666667,  45.58333333,
        45.91666667,  46.08333333,  46.16666667,  46.5       ,
        46.41666667,  47.33333333,  47.83333333,  47.75      ,
        48.        ,  49.5       ,  49.91666667,  50.5       ,
        50.66666667,  50.83333333,  45.33333333,  45.66666667,
        46.33333333,  47.16666667,  48.16666667,  49.08333333,
        49.58333333,  49.75      ,  49.83333333,  46.66666667,
        46.91666667,  47.5       ,  49.41666667,  49.33333333,
        49.66666667,  53.25      ,  53.33333333,  54.08333333,
        54.41666667,  56.91666667,  58.41666667,  65.75      ,
        93.33333333,  95.41666667,  71.41666667,  72.75      ,
        66.08333333,  47.66666667,  48.33333333,  50.08333333,
        50.33333333,  50.41666667,  50.75      ,  51.33333333,
        51.58333333,  51.75      ,  51.83333333,  52.08333333,
        52.25      ,  52.66666667,  52.91666667,  72.91666667,
        78.41666667,  79.83333333,  82.58333333,  83.08333333,
        83.91666667,  79.5       ,  80.33333333,  80.58333333,
        80.83333333,  81.66666667,  84.25      ,  50.16666667,
        50.58333333,  51.91666667,  52.16666667,  52.41666667,
        52.5       ,  52.58333333,  53.5       ,  53.91666667,
        54.33333333,  54.75      ,  58.83333333,  88.83333333,
        88.91666667,  89.58333333,  90.58333333,  90.83333333,
        91.5       ,  92.5       ,  86.91666667,  87.5       ,
        87.58333333,  87.66666667,  87.75      ,  88.16666667,
        88.33333333,  88.75      ,  89.41666667,  89.91666667,
        90.08333333,  90.25      ,  90.66666667,  90.75      ,
        90.91666667,  91.66666667,  91.75      ,  91.83333333,
        92.83333333,  82.75      ,  84.58333333,  85.5       ,
        86.25      ,  86.75      ,  87.08333333,  87.16666667,
        87.25      ,  87.41666667,  85.33333333,  85.41666667,
        85.66666667,  50.91666667,  52.33333333,  52.75      ,
        53.08333333,  53.16666667,  72.33333333,  71.75      ,
        72.25      ,  70.5       ,  72.66666667,  73.33333333,
        95.33333333,  95.5       ,  95.91666667,  74.08333333,
        48.66666667,  48.83333333,  49.25      ,  50.25      ,
        51.08333333,  51.41666667,  51.5       ,  52.83333333,
        53.66666667,  53.83333333,  48.41666667,  51.16666667,
        53.41666667,  53.75      ,  91.91666667,  95.75      ,
        95.66666667,  42.66666667,  44.08333333,  46.25      ,
        91.25      ,  91.41666667,  88.66666667,  91.08333333,
        70.41666667,  72.08333333,  68.25      ,  70.25      ,
        53.58333333,  69.5       ,  55.33333333,  51.25      ,
        51.66666667,  54.16666667,  68.91666667,  69.16666667,
        54.66666667,  83.5       ,  85.16666667,  86.16666667,
        86.33333333,  86.66666667,  85.91666667,  86.41666667,
        86.5       ,  86.58333333,  89.25      ,  87.33333333,
        89.66666667,  95.58333333,  73.08333333,  75.16666667,
        46.83333333,  86.83333333,  86.08333333,  76.33333333,
        70.75      ,  47.58333333,  48.25      ,  45.41666667,
        49.16666667,  43.33333333,  43.66666667,  43.75      ,
        44.75      ,  46.        ,  45.        ,  45.75      ,
        95.83333333,  96.58333333,  72.16666667,  96.33333333,
        96.75      ,  47.25      ,  47.41666667,  45.25      ,
        45.83333333,  43.41666667,  43.58333333,  43.83333333,
        44.58333333,  96.16666667,  96.5       ,  44.33333333,
        96.08333333,  43.08333333,  43.25      ,  43.5       ,
        43.        ,  44.5       ,  44.        ,  96.66666667,
        97.16666667,  96.25      ,  96.41666667,  43.16666667,
        43.91666667,  97.25      ,  97.41666667,  96.91666667,
        97.33333333,  97.08333333,  97.75      ,  96.83333333,
        97.58333333])
In [51]:
hdb_price_df['remaining_lease'].nunique() 
Out[51]:
661
In [52]:
remaining_lease_13_23 = hdb_price_df_13_23[['remaining_lease','adjusted_price']]
print(remaining_lease_13_23['adjusted_price'].min(),remaining_lease_13_23['adjusted_price'].max())
139610.48674198985 1302877.7242824065
In [53]:
remaining_lease_13_23
Out[53]:
remaining_lease adjusted_price
0 74.333333 NaN
1 74.583333 764484.737287
2 74.666667 683135.212105
3 75.083333 809783.658934
4 75.583333 579285.515947
... ... ...
217186 59.000000 351606.689699
217187 55.000000 355865.392536
217188 66.000000 339813.823214
217189 81.000000 779965.088238
217190 70.000000 482057.867327

213971 rows × 2 columns

In [54]:
remaining_lease_13_23.nunique()
Out[54]:
remaining_lease      658
adjusted_price     54180
dtype: int64
In [55]:
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)

# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

# 2004 to 2013
p=sns.regplot(x='remaining_lease', y='adjusted_price', data=remaining_lease_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})

# Label x-axis and x ticks
ax.set_xlabel("Remaining Lease (in Years)", fontsize=14, labelpad=10) 
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)

# Reformat y-axis
p.set_ylabel('Resale Price (SGD$)', size=15)
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylim([50000,1100000])

# Remove the spines and make right thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)

# Add in line and rectangle on top
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Remaining Lease Left and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Scatterplot and Regression Line — How (if any) HDB Resale Price is associated with Remaining Number of Years of Lease", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)

plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/1070124517.py:17: UserWarning: FixedFormatter should only be used together with FixedLocator
  p.set_yticklabels(ylabels)

Observations:

Newer HDB Flats (i.e. longer remaining lease) are associated with higher Resale Prices.

Comparing across both 10-year periods:

Not much visible difference between the steepness of graphs showing the relationship between Remaining Lease and HDB Resale Prices across both decades.

</font>

5. Exploring the Relationship between Locational Features and HDB Resale Prices¶

The table below summarizes the locational features of a flat that we will be conducting EDA with — | No. | Feature Name | Values | | :---| :----: | :---: | | 1 | mrt_count_within_1km | 1, 2, 3, ...| | 2 | mall_count_within_1km | 1, 2, 3, ...| | 3 | good_sch_count_within_1km | 1, 2, 3, ...| | 4 | shortest_dist_mrt | 1, 2, 3, ...| | 5 | shortest_dist_mall | 1, 2, 3, ...| | 6 | good_sch_score | 1, 2, 3, ...| | 7 | distance_to_cbd_km | 1, 2, 3, ...| | 8 | matured | 0 or 1|

5.1 Number of Amenities within 1 km¶

Amenities are defined as MRT Stations, Shopping Malls and Good Schools.

In [56]:
import re

fig, ax = plt.subplots(figsize=(16,8), dpi=96)

# Number of MRT stations within 1 km
ax1 = plt.subplot(131)
a=sns.regplot(x='mrt_count_within_1km', y='adjusted_price', data=hdb_price_df_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
ylabels = ['{:,.0f}'.format(x) + 'K' for x in a.get_yticks()/1000]
ax1.set_yticklabels(ylabels)
ax1.set_ylabel('Resale Price (SGD$)', size=15)
ax1.set_xlabel('Number of MRT Stations', size=15)

# Number of Malls within 1 km
ax2 = plt.subplot(132)
b=sns.regplot(x='mall_count_within_1km', y='adjusted_price', data=hdb_price_df_13_23, color = '#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
ax2.set_yticklabels(ylabels)
ax2.set_ylabel('', size=15)
ax2.set_xlabel('Number of Malls', size=15)

# Number of Good Schools within 1 km
ax3 = plt.subplot(133)
c=sns.regplot(x='good_sch_count_within_1km', y='adjusted_price', data=hdb_price_df_13_23, color = '#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
ax3.set_yticklabels(ylabels)
ax3.set_ylabel('', size=15)
ax3.set_xlabel('Number of Good Schools', size=15)

# Add in line and rectangle on top
ax1.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax1.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax1.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Number of Amenities within 1km and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax1.text(x=0.05, y=.90, s="Scatterplot & Regression Line — How (if any) HDB Resale Price is associated with Number of Nearby Amenities like MRT Stations, Shopping Malls and Good Schools", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/1145574367.py:9: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax1.set_yticklabels(ylabels)
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/1145574367.py:16: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax2.set_yticklabels(ylabels)
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/1145574367.py:23: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax3.set_yticklabels(ylabels)

Observations:

(Relative Comparision)

Number of MRT Stations within 1km vs HDB Resale Price:

  • Strong and postiive correlation

Number of Malls within 1km vs HDB Resale Price:

  • Weak and positive correlation

Number of Good Schools within 1km vs HDB Resale Price:

  • Weak and positive correlation

Comparing across both 10-year periods:

We can see a trend that having more MRT Stations that are nearby HDB Flats, are starting to get significantly more popular in the recent decade as shown in the increase in steepness of the regression line (2013 to 2023 graph). We cannot really say the same for the number of nearby malls and good schools.

</font>

5.2 Distance to Nearest MRT (in km)¶

In [58]:
print(hdb_price_df_13_23['shortest_dist_mrt'].min(), hdb_price_df_13_23['shortest_dist_mrt'].max())
0.0360827902285726 4.080782752857259
In [57]:
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)

# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

p=sns.regplot(x='shortest_dist_mrt', y='adjusted_price', data=hdb_price_df_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})

# Label x-axis and x ticks
ax.set_xlabel("Shortest Distance to nearest MRT (km)", fontsize=14, labelpad=10) 
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)

# Reformat y-axis
p.set_ylabel('Resale Price SGD ($)', size=15)
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylim([50000,1200000])

# Remove the spines and Make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)

ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Distance to Nearest MRT and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Scatterplot & Regression Line — how HDB resale prices vary with distance to nearest MRT", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)

plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/3737170104.py:16: UserWarning: FixedFormatter should only be used together with FixedLocator
  p.set_yticklabels(ylabels)

Observations:

General trend is that the further away the HDB is from the nearest MRT, it is associated with a decrease in its resale price.

Surprisingly, some HDBs are more than 4km away from the nearest MRT have a relatively high resale price. Let's see which town(s) these HDBs belong to

Comparing across both 10-year periods:

It is safe to say that all HDBs are within about 4km to the nearest MRT Station in the recent decade, unlike in 2003 to 2012, where there are numerous HDB flats that are as far as more than 7km away from the nearest MRT Station. </font>

In [62]:
towns_more_than_4km_list = hdb_price_df_13_23[hdb_price_df_13_23['shortest_dist_mrt'] > 4]['town'].unique()
towns_more_than_4km = ''.join([str(town) for town in towns_more_than_4km_list])
print('Town(s) with HDBs that are more than 4km away from the nearest MRT Station:\n',towns_more_than_4km)
Town(s) with HDBs that are more than 4km away from the nearest MRT Station:
 MARINE PARADE

Although Marine Parade is still the furthest away from its nearest MRT Station, at least it is not 7km, like 20 years ago.

5.3 Distance to Nearest Shopping Mall¶

In [63]:
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)

# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

p=sns.regplot(x='shortest_dist_mall', y='adjusted_price', data=hdb_price_df_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})


# Label x-axis and x ticks
ax.set_xlabel("Distance to Nearest Shopping Mall (km)", fontsize=14, labelpad=10) 
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)

# Reformat y-axis
p.set_ylabel('Resale Price (SGD$)', size=15)
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylim([50000,1200000])

# Remove the spines and Make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)

ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Distance to Nearest Shopping Mall and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Scatterplot & Regression Line — how HDB resale prices vary with distance to nearest Shopping Mall", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)

plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/141229708.py:17: UserWarning: FixedFormatter should only be used together with FixedLocator
  p.set_yticklabels(ylabels)

Observations:

There is a relatively strong and negative association between Distance to Nearest Shopping Mall and HDB Resale Prices. In other words, HDB Flats that are further away from their Nearest Shopping Mall are generally associated with a lower Resale Price.

Comparing across both 10-year periods:

Not much visible difference between the steepness of graphs showing the relationship between Distance to Nearest Shopping Mall and HDB Resale Prices across both decades.

However, there seems to be HDB Flats that are STILL more than 6km away from the Nearest Shopping Mall that have relatively high resale prices. Let's see which town(s) these HDBs belong to and what are the exact HDBs features. </font>

In [64]:
hdb_more_than_6km_from_nearest_mall = hdb_price_df_13_23[hdb_price_df_13_23['shortest_dist_mall'] > 6]
hdb_more_than_6km_from_nearest_mall
Out[64]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date remaining_lease ... good_sch_list_within_1km good_sch_score good_sch_score_sap_gep distance_to_cbd_km matured year CPI adjusted_price flat_type_year blocknum
69221 2014-09-01 MARINE PARADE 5 ROOM 27 MARINE CRES 07 TO 09 126.0 Standard 1975 60.0 ... [] 0 0 23.277334 1 2014 99.553 813636.957199 5 ROOM - 2014 27
69222 2013-06-01 MARINE PARADE 5 ROOM 27 MARINE CRES 04 TO 06 128.0 Standard 1975 61.0 ... [] 0 0 23.277334 1 2013 97.656 832399.442943 5 ROOM - 2013 27

2 rows × 35 columns

In [65]:
towns_more_than_6km_list = hdb_more_than_6km_from_nearest_mall['town'].unique()
towns_more_than_6km = ''.join([str(town) for town in towns_more_than_6km_list])

street_name_more_than_6km_list = hdb_more_than_6km_from_nearest_mall['street_name'].unique()
street_name_more_than_6km = ''.join([str(street_name) for street_name in street_name_more_than_6km_list])

print('Town(s) and corresponding Street Name(s) of HDBs that are more than 6km away from the nearest Shopping Mall:\n Town:',towns_more_than_6km,'\nStreet Name:',street_name_more_than_6km )
Town(s) and corresponding Street Name(s) of HDBs that are more than 6km away from the nearest Shopping Mall:
 Town: MARINE PARADE 
Street Name: MARINE CRES

5.4 Good Schools Score¶

In [ ]:
hdb_price_df['good_sch_score'].unique()
Out[ ]:
array([3, 4, 1, 0, 2])
In [69]:
fig, ax = plt.subplots(figsize=(12, 14), dpi=96)

# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

p=sns.violinplot(x='good_sch_score',
                 y='adjusted_price', 
                 data=hdb_price_df_13_23, 
                 width=1, 
                 palette='Set3',
                 order=hdb_price_df_13_23.groupby('good_sch_score')['adjusted_price'].median().reset_index()['good_sch_score'].tolist())


p.set_xticklabels(p.get_xticklabels(), rotation=0, ha='right')
p.set_xlabel('Good Schools Score', size=15)
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)

ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylabel('Resale Price (SGD$)', size=15)

# Remove the spines and make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)

# Add in  line and rectangle on top
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Good School Score and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=18, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Violin Boxplot — How HDB resale prices vary with different Good Schools Score", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/3215674227.py:20: UserWarning: FixedFormatter should only be used together with FixedLocator
  p.set_yticklabels(ylabels)

Observations:

The scoring of good schools within 1 km of HDB Flat does not seem to be correlated with its resale price, as seen by the similar distribution of resale prices in the violin plot and corresponding median resale prices.

Comparing across both 10-year periods:

Not much visible difference between the distributions of HDB resale prices for the different Good Schools Score. </font>

5.5 Distance to Central Business District (CBD)¶

In [70]:
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)

# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

# 2004 to 2013
p=sns.regplot(x='distance_to_cbd_km', y='adjusted_price', data=hdb_price_df_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})

# Label x-axis and x ticks
ax.set_xlabel("Distance to CBD (in km)", fontsize=14, labelpad=10) 
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)

# Reformat y-axis
p.set_ylabel('Resale Price SGD ($)', size=15)
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylim([50000,1300000])

# Remove the spines
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)

# Add in  line and rectangle on top
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Distance to CBD and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=18, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Scatterplot & Regression Line — How HDB resale prices vary with distance to Central Business District (CBD)", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)

plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/2485396176.py:17: UserWarning: FixedFormatter should only be used together with FixedLocator
  p.set_yticklabels(ylabels)

Observations:

Strong and negative correlation between Distance to CBD and HDB Resale Prices. The further the distance of HDB Flats from the Central Business District, there is an association that resale prices are lower and vice-versa.

Comparing across both 10-year periods:

In the recent decade (i.e. from 2013 to 2023), there is a stronger negative correlation between Distance to CBD and HDB Resale Prices compared to the earlier decade (i.e. 2003 to 2012); the steeper gradient in the recent decade implies that HDB Flats that are further away from CBD are associated with a more than proportionate dip in its prices. This suggests the likelihood of changing preferences in HDB owners' when it comes to the proximity of the HDB Flats and Central Business District. </font>

5.6 Maturity of HDB Estate¶

In [71]:
hdb_price_df['matured'].unique()
Out[71]:
array([1, 0])
In [73]:
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)

# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)

p = sns.boxplot(x='matured', y='adjusted_price', data = hdb_price_df_13_23, palette=['#e69798', '#e69798'])
p.set_xticklabels(['Not Matured', 'Matured'], size = 15)
p.set_xlabel('', size=15)

ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylabel('HDB Resale Price (SGD$)')
p.set_ylim([0,1400000])

# Remove the spines and make right thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)

ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))

# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Maturity of Estate and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Boxplot — Investigate whether or not HDB Flats in Mature Estates are associated with its Resale Prices", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)

plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/3590593329.py:12: UserWarning: FixedFormatter should only be used together with FixedLocator
  p.set_yticklabels(ylabels)

Observations:

Not much visible association between maturity of estates and resale prices. However, the interquartile range of HDB Resale Prices for Matured Estates is larger than that of Non-Matured Estates.

Comparing across both 10-year periods:

In the recent decade (i.e. from 2013 to 2023), Matured Estates tend to fetch a slightly higher Median HDB Resale Prices compared to Non-Matured Estates while Median Prices are more or less the same regardless of the Estate's Maturity for the previous decade (i.e. 2003 to 2012). </font>